本次更新有几点:

1 Q&A

1.1 Q1

相关字段确定及公式更新

  • 销售单价(pm_price):客户购买时页面售价(不同顾客可能不一样,部分顾客的价格可能是去掉优惠之后的会员价);
  • 优惠金额总计(marketing):市场部优惠券金额+促销满减金额+促销满赠金额+使用账户余额金额+MP商家优惠券金额+官网运营费+类目运营费+人事部(289)+药师(290)+分摊后的激励金+使用返利余额金额+其他部门优惠券金额;
  • 顾客实际付款金额(gmv) :商品数量 * 销售单价 + 运费金额 + 处方费 - Marketing;
  • 市场部优惠券金额:发放给用户的市场部优惠券在订单中使用的金额;
  • 促销满减金额:参加满减活动商品在订单中扣减的金额;
  • 促销满赠金额:参加满赠活动商品在订单中扣减的金额;
  • 使用账户余额金额:在订单中使用的账户中余额支付的金额;
  • MP商家优惠券金额:MP商家发放的优惠券在订单中使用的金额;
  • 毛利额(去税)(gp_amt):根据dlvy_date出库时间、biz_model销售模型(分为自营和MP),分别对应不同的公式;
    • 2020.1.1之前:自营:GMV÷(1+销售税率)-移动平均价(去税) *商品数量 ; MP:GMV*0.03÷(1+销售税率)+处方费÷1.06-(处方费÷3)*0.3
    • 2020.1.1-2020.12.31: 自营:GMV÷(1+销售税率)-移动平均价(去税) *商品数量 ; MP:GMV*0.05÷(1+销售税率)+处方费÷1.06-(处方费÷3)*0.3
  • revenue:根据dlvy_date出库时间、biz_model销售模型(分为自营和MP),分别对应不同的公式
    • 2020.1.1之前:自营:GMV÷(1+销售税率) ; MP:GMV*0.03÷(1+销售税率)+处方费÷1.06
    • 2020.1.1-2020.12.31: 自营:GMV÷(1+销售税率) ; MP:GMV*0.05÷(1+销售税率)+处方费÷1.06
  • 下单口径:ordr_time, 支付口径:pay_date, 出库口径:dlvy_date

1.2 Q2

why user_id is missing in some orders?

后台下单(手工单)时,没有user_id,药师后台代下单, 三方运营重下三方订单(天猫、京东等)。

1.3 Q3

why ordr_detl_id and ordr_id are not unique?

pm_list_id是唯一的,购买数量比较多的药品时,ordr_detl_id可能会被拆成几个pm_list_id。下面会有pm_list_id的唯一性分析。

1.4 Q4

Why some orders can be found in the order table, but some others cannot?

已提供部分订单号给公司验证

1.5 Q5

many variables have negative or missing values, such as pm_num (order quantity), pm_use_coupn_amt, revenue, and gp_amt

部分字段出现负值完全是因为退货订单(rma_yn=1),以下有相关分析。

1.6 Q6

Why an order not cancelled can be invalid?

已提供部分订单号给公司验证

1.7 Q7

部分字段的真实取值跟dictionary对不上

已申请新的字段解释,正在审批中

1.8 Q8

部分字段废弃

revenue_freight、gp_amt_product 这两个字段废弃。

1.9 Q9

1)如果一个人买了又退,是不是有两条不同order_id的订单,一条rma_yn为0,一条为1?

同一个人的退货订单是向表里增加记录,会导致重复,更新pt_d时间,但是退货订单的商品数量、金额可能与原订单不同,顾客可能只退一部分货。

1.10 Q10

dictionary更新

向dictionary中增加了2个表 - 111标准指标库.csv:使用sheet:1药网,该表给出了各种需要计算的字段的公式,以此为准。 - 所有表字段值解释.csv : 该表给出了多个数据库表的字段值解释,比如B2C订单中订单状态A2N,A2Z,A2A分别表示什么含义

2 Summary of B2C_orders

Add a column Nminus

summaryData = function(dat){
    writeLines(paste("Number of obs:", nrow(dat)))
    names = colnames(dat)
    flags = matrix(NA,length(names),12)
    colnames(flags) = c("NAs","Zeros","NZs",'Mean','SD',"Min",'Median',"Max","Unique","class", 'Size', 'Nminus')
    rownames(flags) = names
    for(name in names){
        ix = dat[, !is.na(get(name))]
        val = dat[ix, get(name)]
        if(all(unique(val) %in% c('True','False'))) val = val=='True'
        if(class(val[1])[1] == 'logical') val = as.numeric(val)
        flags[name,1:3] = c(sum(!ix),sum(val==0),sum(val!=0))
        if(class(val[1])[1] %in% c("numeric","integer")){
            flags[name,4:8] = round(c(mean(val),sd(val),min(val),median(val),max(val)),digits=2)
            flags[name, 12] = sum(val<0)
        }
        flags[name, 9] = length(unique(val))
        flags[name, 10] = class(val[1])[1]
        flags[name, 11] = object.size(val) / (1024^2) # MB
    }
    rm(val)
    return(flags)
}
orders = readRDS('E:/University of Connecticut/Peng, Jing - 111/data/cleaned/orders.RDS')
orders = orders %>% mutate_if(is.numeric , replace_na, replace = 0)
orders$dlvy_date = as.POSIXct(orders$dlvy_date, format="%Y-%m-%d", tz='UTC')
orders$ordr_time = as.POSIXct(orders$ordr_time, format="%Y-%m-%d", tz='UTC')
# ZZ = summaryData(orders)
ZZ = fread('E:/University of Connecticut/Peng, Jing - 111/data/Guan/summaryData_orders.csv')
datatable(ZZ, filter = 'top', options = list(pageLength = 15), caption = 'Table 1: The summary of B2C orders.')

3 Verify whether the formulas is correct

3.1 Revenue formula √

  • revenue:根据dlvy_date出库时间、biz_model销售模型(分为自营和MP),分别对应不同的公式
    • 2020.1.1之前:自营:GMV÷(1+销售税率) ; MP:GMV*0.03÷(1+销售税率)+处方费÷1.06
    • 2020.1.1-2020.12.31: 自营:GMV÷(1+销售税率) ; MP:GMV*0.05÷(1+销售税率)+处方费÷1.06
# dlvy_date NA值共有1917235行
orders[biz_model != 1, cal_revenue := (abs(gmv)/(1+abs(taxrate)))] #自营
orders[dlvy_date <= as.POSIXct('2020-1-1', format="%Y-%m-%d", tz='UTC') & biz_model ==1, cal_revenue := (abs(gmv)*0.03/(1+abs(taxrate))+ abs(prescription_amount)/1.06)] # MP, 2020.1.1之前
orders[dlvy_date > as.POSIXct('2020-1-1', format="%Y-%m-%d", tz='UTC') & biz_model ==1, cal_revenue := (abs(gmv)*0.05/(1+abs(taxrate))+ abs(prescription_amount)/1.06)] # MP, 2020.1.1之后
orders[is.na(dlvy_date) & biz_model ==1, cal_revenue := (abs(gmv)*0.03/(1+abs(taxrate))+ abs(prescription_amount)/1.06)] # MP, dlvy_date is NA
orders[, isEqual_R := ifelse((abs(revenue)-abs(cal_revenue))<1,1,0)] 
orders[,table(isEqual_R)]
isEqual_R
       0        1 
  178365 14694304 

3.2 Marketing formula √

Marketing = 市场部优惠券金额(mkt_coupn_amt)+促销满减(pm_pmtn_amt)+促销满赠(pmtn_full_amt)+使用账户余额金额(use_acct_amt)+MP商家优惠券金额(mp_coupn_amt) + 官网运营费(of_web_coupn_amt) +类目运营费(cat_coupn_amt)+人事部(hr_coupn_amt)+药师费(pamst_coupn_amt)+分摊后的激励金(intv_pay)+使用返利余额金额(use_re_sum_amt)+其他部门优惠券金额(空)

orders[, cal_marketing := (mkt_coupn_amt+pm_pmtn_amt+pmtn_full_amt+use_acct_amt+mp_coupn_amt+of_web_coupn_amt+cat_coupn_amt+hr_coupn_amt+pamst_coupn_amt+intv_pay+use_re_sum_amt)][, isEqual_M := ifelse((abs(marketing)-abs(cal_marketing))<1,1,0)]
orders[,table(isEqual_M)]
isEqual_M
       0        1 
   21043 14851626 

3.3 Gp_amt formula √

  • 毛利额(去税)(gp_amt):根据dlvy_date出库时间、biz_model销售模型(分为自营和MP),分别对应不同的公式;
    • 2020.1.1之前:自营:GMV÷(1+销售税率)-移动平均价(去税) *商品数量 ; MP:GMV*0.03÷(1+销售税率)+处方费÷1.06-(处方费÷3)*0.3
    • 2020.1.1-2020.12.31: 自营:GMV÷(1+销售税率)-移动平均价(去税) *商品数量 ; MP:GMV*0.05÷(1+销售税率)+处方费÷1.06-(处方费÷3)*0.3
orders[biz_model != 1, cal_gp := (abs(gmv)/(1 + abs(taxrate)) - abs(move_avg_prc_notax)*abs(pm_num))]# 自营
orders[dlvy_date <= as.POSIXct('2020-1-1', format="%Y-%m-%d", tz='UTC') & biz_model ==1,cal_gp := (abs(gmv)*0.03)/(1+abs(taxrate)) + abs(prescription_amount)/1.06 - (abs(prescription_amount)/3)*0.3]# MP, 2020.1.1之前
orders[dlvy_date > as.POSIXct('2020-1-1', format="%Y-%m-%d", tz='UTC') & biz_model ==1,cal_gp := (abs(gmv)*0.05)/(1+abs(taxrate)) + abs(prescription_amount)/1.06 - (abs(prescription_amount)/3)*0.3]# MP, 2020.1.1之后
orders[is.na(dlvy_date) & biz_model ==1,cal_gp := (abs(gmv)*0.03)/(1+abs(taxrate)) + abs(prescription_amount)/1.06 - (abs(prescription_amount)/3)*0.3]# MP, dlvy_date is NA
orders[, isEqual_G := ifelse((abs(gp_amt)-abs(cal_gp))<1,1,0)]
orders[,table(isEqual_G)]
isEqual_G
       0        1 
  371492 14501177 

3.4 act_sale & gmv

顾客实际付款金额(gmv) :商品数量 * 销售单价 + 运费金额 + 处方费 - Marketing;

orders[, cal_gmv := abs(pm_num)*abs(pm_price) + abs(freight) + abs(prescription_amount) - abs(marketing)]
orders[, isEqual_GMV := ifelse((abs(gmv)-abs(cal_gmv))<1,1,0)]
orders[, table(isEqual_GMV)]
isEqual_GMV
       0        1 
  295614 14577055 
orders[,table((abs(act_sale)-abs(gmv)) < 1)]

   FALSE     TRUE 
  946046 13926623 
orders[,table((abs(act_sale)-abs(cal_gmv)) < 1)]

   FALSE     TRUE 
 1231282 13641387 

4 The relationship between features and wrong values calculated by the formula

4.1 ordr_time 下单时间

tc=orders[,.(wrong_revenue=length(which(isEqual_R==0)),wrong_marketing=length(which(isEqual_M==0)),wrong_gp_amt=length(which(isEqual_G==0))),by=ordr_time]
long_tc = melt(tc,id='ordr_time')
long_tc$value = log(long_tc$value)
long_tc$ordr_time = as.Date(long_tc$ordr_time)
p = ggplot(data = long_tc,aes(x=ordr_time,y=value,group=variable, color=variable))+
  geom_line()+
  labs(x = "Time", y = "Wrong value Count(log)") + 
  scale_x_date(date_breaks="1 months",date_labels="%b /%m")+ 
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))
ggplotly(p)

4.2 ordr_stus 订单状态

tc=orders[,.(wrong_revenue=length(which(isEqual_R==0)),wrong_marketing=length(which(isEqual_M==0)),wrong_gp_amt=length(which(isEqual_G==0))),by=ordr_stus]
long_tc = melt(tc,id='ordr_stus')
long_tc$value = log(long_tc$value)
ggplot(data = long_tc,aes(x=ordr_stus,y=value,group=variable, color=variable))+
  geom_line()+
  labs(x = "ordr_stus", y = "Wrong value Count(log)")+
  scale_x_discrete(breaks=c("A2K", "A2N", "A2T", "A2Z", "PNP", "PNS"),
                   labels=c("已拆单", "订单已取消", "用户已签收", "订单完成", "人工审核不通过", "订单完成(人工审核)"))

4.3 sale_type_id 销售类型

tc=orders[,.(wrong_revenue=length(which(isEqual_R==0)),wrong_marketing=length(which(isEqual_M==0)),wrong_gp_amt=length(which(isEqual_G==0))),by=sale_type_id]
long_tc = melt(tc,id='sale_type_id')
long_tc$value = log(long_tc$value)
ggplot(data = long_tc,aes(x=sale_type_id,y=value,group=variable, color=variable))+
  geom_line()+scale_x_continuous(breaks=long_tc$sale_type_id)+
  labs(x = "sale_type_id", y = "Wrong value Count(log)")

4.4 biz_model 业务模式

tc=orders[,.(wrong_revenue=length(which(isEqual_R==0)),wrong_marketing=length(which(isEqual_M==0)),wrong_gp_amt=length(which(isEqual_G==0))),by=biz_model]
long_tc = melt(tc,id='biz_model')
long_tc$value = log(long_tc$value)
ggplot(data = long_tc,aes(x=biz_model,y=value,group=variable, color=variable))+geom_line()+scale_x_continuous(breaks=long_tc$biz_model)+labs(x = "biz_model", y = "Wrong value Count(log)")

5 The relationship between features and minus values

5.1 rma_yn =1为退货订单

We found gp_amt and pm_num are negative when ram_yn equals to 1.

tc = orders[,
            .(Nminus_gp_amt = length(which(gp_amt<0)),
              Nminus_pm_num = length(which(pm_num<0)),
              Nminus_cost_hastax = length(which(cost_hastax<0)),
              Nminus_revenue = length(which(revenue<0)),
              Nminus_gmv = length(which(gmv<0)),
              Nminus_act_sale = length(which(act_sale<0)),
              Nminus_acc_sale = length(which(acc_sale<0))),
            by =rma_yn]
long_tc = melt(tc,id='rma_yn')
long_tc$value = log(long_tc$value)
ggplot(data = long_tc,aes(x=rma_yn,y=value,fill=variable))+
    geom_bar(position = 'dodge',stat = 'identity')+
    labs(x = "rma_yn", y = "Minus value Count(log)")+
    scale_x_continuous(breaks=long_tc$rma_yn)+
    scale_fill_brewer(palette = "Set3")

5.2 ordr_time 下单时间

tc = orders[,
            .(Nminus_gp_amt = length(which(gp_amt<0)),
              Nminus_pm_num = length(which(pm_num<0)),
              Nminus_cost_hastax = length(which(cost_hastax<0)),
              Nminus_revenue = length(which(revenue<0)),
              Nminus_gmv = length(which(gmv<0)),
              Nminus_act_sale = length(which(act_sale<0)),
              Nminus_acc_sale = length(which(acc_sale<0))),
            by =ordr_time]
long_tc = melt(tc,id='ordr_time')
long_tc$value = log(long_tc$value)
long_tc$ordr_time = as.Date(long_tc$ordr_time)
p = ggplot(data = long_tc,aes(x=ordr_time,y=value,group=variable, color=variable))+
  geom_line()+
  labs(x = "Time", y = "Minus value Count(log)")+ 
  scale_x_date(date_breaks="1 months",date_labels="%b /%m")+ 
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))
ggplotly(p)

5.3 ordr_stus 订单状态

tc = orders[,
            .(Nminus_gp_amt = length(which(gp_amt<0)),
              Nminus_pm_num = length(which(pm_num<0)),
              Nminus_cost_hastax = length(which(cost_hastax<0)),
              Nminus_revenue = length(which(revenue<0)),
              Nminus_gmv = length(which(gmv<0)),
              Nminus_act_sale = length(which(act_sale<0)),
              Nminus_acc_sale = length(which(acc_sale<0))),
            by =ordr_stus]
long_tc = melt(tc,id='ordr_stus')
long_tc$value = log(long_tc$value)
ggplot(data = long_tc,aes(x=ordr_stus,y=value,group=variable, color=variable))+
  geom_line()+
  labs(x = "ordr_stus", y = "Minus value Count(log)")+
  scale_x_discrete(breaks=c("A2K", "A2N", "A2T", "A2Z", "PNP", "PNS"),
                   labels=c("已拆单", "订单已取消", "用户已签收", "订单完成", "人工审核不通过", "订单完成(人工审核)"))

5.4 sale_type_id 销售类型

tc = orders[,
            .(Nminus_gp_amt = length(which(gp_amt<0)),
              Nminus_pm_num = length(which(pm_num<0)),
              Nminus_cost_hastax = length(which(cost_hastax<0)),
              Nminus_revenue = length(which(revenue<0)),
              Nminus_gmv = length(which(gmv<0)),
              Nminus_act_sale = length(which(act_sale<0)),
              Nminus_acc_sale = length(which(acc_sale<0))),
            by =sale_type_id]
long_tc = melt(tc,id='sale_type_id')
long_tc$value = log(long_tc$value)
ggplot(data = long_tc,aes(x=sale_type_id,y=value,group=variable, color=variable))+
  geom_line()+
  scale_x_continuous(breaks=long_tc$sale_type_id)+labs(x = "sale_type_id", y = "Minus value Count(log)")

5.5 biz_model 业务模式

tc = orders[,
            .(Nminus_gp_amt = length(which(gp_amt<0)),
              Nminus_pm_num = length(which(pm_num<0)),
              Nminus_cost_hastax = length(which(cost_hastax<0)),
              Nminus_revenue = length(which(revenue<0)),
              Nminus_gmv = length(which(gmv<0)),
              Nminus_act_sale = length(which(act_sale<0)),
              Nminus_acc_sale = length(which(acc_sale<0))),
            by =biz_model]
long_tc = melt(tc,id='biz_model')
long_tc$value = log(long_tc$value)
ggplot(data = long_tc,aes(x=biz_model,y=value,group=variable, color=variable))+
  geom_line()+
  labs(x = "biz_model", y = "Minus value Count(log)")+
  scale_x_continuous(breaks=long_tc$biz_model, labels = long_tc$biz_model)

6 The relationship between rma_yn and other features

6.1 rma=1 trend

rma_orders = orders[,.(rma_count = log(length(which(rma_yn==1)))), by=ordr_time]
rma_orders$ordr_time = as.Date(rma_orders$ordr_time)
p = ggplot(data = rma_orders,aes(x=ordr_time,y=rma_count))+
  geom_line(colour='#FF4500')+
  labs(x = "Time", y = "Return order Count(log)")+ 
  scale_x_date(date_breaks="1 months",date_labels="%b /%m")+ 
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))
ggplotly(p)

6.2 pm_num

商品数量

orders[,table(rma_yn, pm_num<0)]
      
rma_yn    FALSE     TRUE
     0 14732118        0
     1        0   140551

6.3 gp_amt

毛利额

orders[,table(rma_yn, gp_amt<0)]
      
rma_yn    FALSE     TRUE
     0 13345935  1386183
     1    18353   122198

6.4 cost_hastax

含税成本

orders[,table(rma_yn, cost_hastax<0)]
      
rma_yn    FALSE     TRUE
     0 14732118        0
     1     4548   136003

6.5 gmv

顾客实际付款金额

orders[,table(rma_yn, gmv<0)]
      
rma_yn    FALSE     TRUE
     0 14730505     1613
     1     8015   132536

6.6 revenue

gmv去税

orders[,table(rma_yn, revenue<0)]
      
rma_yn    FALSE     TRUE
     0 14729398     2720
     1     8041   132510

6.7 marketing

优惠金额总计

orders[,table(rma_yn, marketing<0)]
      
rma_yn    FALSE     TRUE
     0 14732115        3
     1   119766    20785

7 退货情况下,判断订单是否有重复

7.1 pm_list_id重复次数分布

重复出现次数从2到6,可以考虑剔除重复3次及以上的pm_list_id对应的订单

dup_pm = tidyfst::count_dt(orders,pm_list_id) %>% filter_dt(n>1)
dup_pm[,table(n)]
n
     2      3      4      5      6 
157136   3872     29      5      8 

7.2 pm_list_id相同时,个例分析

pm_list_id相同时,很多字段也会出现不相同的情况,如gp_amt, gmv, pm_num, acc_sale, act_sale,可能是退了一部分货,举例说明

instance = orders%>%filter(pm_list_id==11166138524)
datatable(instance,extensions = 'FixedColumns', options = list(scrollX=TRUE,fixedColumns = TRUE))

7.3 判断pm_list_id出现两次时,是不是分别为退货订单和正常订单

check_list = dup_pm[n==2,unique(pm_list_id)]
orders[pm_list_id %in% check_list,table(rma_yn)]
rma_yn
     0      1 
184230 130042 

8 不同类别药品销量按月份变化趋势

按照gmv(用户实付金额)来计算销量

orders[, table(gmv>0)]

   FALSE     TRUE 
  882002 13990667 
orders[,table(cat1_id)]
cat1_id
       0      103      107      108   953710   955306   955405   960464 
      13        5  1532958       20 10875747   594628   271672   303542 
  962285   964106   971591   972052   972103 
  681885   599758      253     3798     8390 
orders[,table(cat1_name)]
cat1_name
参茸花茶、休闲零食           成人用品               家电           美妆个护 
                 5             594628                 20             255655 
          门店专用       维生素、钙剂           药城专用           药妆个护 
               253             681885               8390              16017 
          医疗器械 医疗器械、健康电器           医药拓展           隐形眼镜 
            273074            1259884                 71             303542 
          营养保健             云药房           中西药品           滋补保健 
               174               3727           10875747             599584 
length(unique(orders$cat2_id))
[1] 115
length(unique(orders$cat3_id))
[1] 720

8.1 cat1_name (use ggplot)

按照一级类目名称分类

trend = orders[gmv > 0, .(total_sales = log(sum(gmv))), by = .(ordr_time_cut = cut(ordr_time, breaks = "month"), cat1_name)]
trend$ordr_time_cut = as.Date(trend$ordr_time_cut)
getPalette = colorRampPalette(brewer.pal(5, "Set1"))
p = ggplot(data = trend,aes(x=ordr_time_cut,y=total_sales,group=cat1_name,color=as.factor(cat1_name)))+
  geom_line()+
  geom_point()+
  labs(x = "Time", y = "Sales(log) by category 1")+ 
  scale_x_date(date_breaks="1 months",date_labels="%b /%m")+ 
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))+
  scale_color_manual(name = "Cat1_name", values = getPalette(length(unique(trend$cat1_name))))
ggplotly(p)

8.2 cat1_id (use dygraphs)

按照一级类目ID分类

trend = orders[gmv > 0, .(total_sales = log(sum(gmv))), by = .(ordr_time_cut = cut(ordr_time, breaks = "month"), cat1_id)]
trend$ordr_time_cut = as.Date(trend$ordr_time_cut)
trend_d = dcast(trend,ordr_time_cut~cat1_id,value.var = "total_sales")
dygraph(trend_d, main = "Sales(log) by cat1_id")%>%
  dyRangeSelector(dateWindow = c("2018-12-01", "2020-08-01"))%>%
  dyOptions(colors = RColorBrewer::brewer.pal(6, "Set1"))%>%
  dyOptions(drawPoints = TRUE, pointSize = 2) %>%
  dyEvent("2020-1-23", "COVID-19", labelLoc = "bottom")%>%
  dyLegend(show = "follow")%>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3)) %>%
  dyCSS(textConnection("
     .dygraph-legend > span { display: none; }
     .dygraph-legend > span.highlight { display: inline; }
  "))

8.3 cat2_name(use ggplot)

按照二级类目名称分类

trend = orders[gmv > 0, .(total_sales = log(sum(gmv))), by = .(ordr_time_cut = cut(ordr_time, breaks = "month"), cat2_name)]
trend$ordr_time_cut = as.Date(trend$ordr_time_cut)
getPalette = colorRampPalette(brewer.pal(5, "Set1"))
split_list = split(unique(trend$cat2_name),1:10)
plotlist = list()
for(i in 1:length(split_list)){
    p = ggplot(data = trend[cat2_name %in% split_list[[i]]],aes(x=ordr_time_cut,y=total_sales,group=cat2_name,color=as.factor(cat2_name)))+
        geom_line()+
        geom_point()+
        labs(x = "Time", y = "Sales(log) by cat2_name")+ 
        scale_x_date(date_breaks="1 months",date_labels="%b /%m")+ 
        theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))+
        scale_color_manual(name = "cat2_name", values = getPalette(length(split_list[[i]])))
    plotlist[[i]] = print(ggplotly(p))
}
htmltools::tagList(setNames(plotlist, NULL))

8.4 cat2_id (use dygraphs)

按照二级类目ID分类

trend = orders[gmv > 0, .(total_sales = log(sum(gmv))), by = .(ordr_time_cut = cut(ordr_time, breaks = "month"), cat2_id)]
trend$ordr_time_cut = as.Date(trend$ordr_time_cut)
trend_d = dcast(trend,ordr_time_cut~cat2_id,value.var = "total_sales")
total_num = length(unique(trend$cat2_id))
split_list = split(x=2:total_num,f=1:11)
myfun = function(cut_list){
  tmp = c(1,split_list[[cut_list]])
  dygraph(trend_d[,..tmp], main = "Sales(log) by cat2_id",group = "My group")%>%
      dyRangeSelector(dateWindow = c("2018-12-01", "2020-08-01"))%>%
      dyOptions(colors = RColorBrewer::brewer.pal(6, "Set1"))%>%
      dyOptions(drawPoints = TRUE, pointSize = 2) %>%
      dyEvent("2020-1-23", "COVID-19", labelLoc = "bottom")%>%
      dyLegend(show = "follow")%>%
      dyHighlight(highlightSeriesOpts = list(strokeWidth = 3)) %>%
      dyCSS(textConnection("
         .dygraph-legend > span { display: none; }
         .dygraph-legend > span.highlight { display: inline; }
      "))
}
res = lapply(1:length(split_list), function(i) myfun(i))
htmltools::tagList(res)